AITopics | global action

d9c7c8bd6ad4cebb7d006e5109e0b682-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 23:33:30 GMT

artificial intelligence, factor graph, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Communications > Networks (0.93)

Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits

Neural Information Processing Systems | Apr-29-2026, 23:33:25 GMT

We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. We derive an instance-specific regret lower bound and characterize the minimal expected number of times each global action should be explored. This bound and the corresponding optimal exploration process are obtained by solving a combinatorial optimization problem whose numbers of variables and constraints grow exponentially with the number of agents, and which cannot be exploited in the design of efficient algorithms. Inspired by Mean Field approximation techniques used in graphical models, we provide simple upper bounds on the regret lower bound. The corresponding optimization problems have a reduced number of variables and constraints. By tuning the latter, we may explore the trade-off between the achievable regret and the complexity of computing the corresponding exploration process. We devise Efficient Sampling for MAMAB (ESM), an algorithm whose regret asymptotically matches the approximated lower bounds. The regret and computational complexity of ESM are assessed numerically, using both synthetic and real-world experiments in radio communications networks.
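
To make the scale of the problem concrete, here is a minimal Python sketch of a factor-graph reward over global actions. It assumes the expected reward of a global action is the sum of local rewards, one per factor; the abstract does not spell this out. The factor scopes, reward tables, and function names are hypothetical illustrations, not taken from the paper, and the brute-force search is not the ESM algorithm.

```python
from itertools import product


def global_actions(num_agents, num_local_actions):
    """Enumerate every global action: one local action per agent."""
    return list(product(range(num_local_actions), repeat=num_agents))


def expected_reward(action, factors):
    """Sum the local rewards of all factors (assumed additive structure)."""
    total = 0.0
    for scope, table in factors:
        # Restrict the global action to the agents in this factor's scope.
        local = tuple(action[i] for i in scope)
        total += table[local]
    return total


# Hypothetical instance: 3 agents with 2 local actions each and two factors,
# one over agents (0, 1) and one over agents (1, 2).
factors = [
    ((0, 1), {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.2}),
    ((1, 2), {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.6}),
]

actions = global_actions(num_agents=3, num_local_actions=2)
best = max(actions, key=lambda a: expected_reward(a, factors))
print(len(actions), best, expected_reward(best, factors))
```

With K local actions per agent and N agents there are K**N global actions, so this exhaustive enumeration is only feasible for toy instances.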

algorithm, artificial intelligence, optimization problem, (15 more...)

Neural Information Processing Systems

Country: Europe > Sweden (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)

d9c7c8bd6ad4cebb7d006e5109e0b682-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 10:31:44 GMT

artificial intelligence, factor graph, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Communications > Networks (0.93)

d9c7c8bd6ad4cebb7d006e5109e0b682-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 10:31:40 GMT

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.96)
Information Technology > Communications > Networks (0.68)

An Investigation of Offline Reinforcement Learning in Factorisable Action Spaces

Beeson, Alex, Ireland, David, Montana, Giovanni

arXiv.org Machine Learning | Nov-17-2024

Expanding reinforcement learning (RL) to offline domains generates promising prospects, particularly in sectors where data collection poses substantial challenges or risks. Pivotal to the success of transferring RL offline is mitigating overestimation bias in value estimates for state-action pairs absent from data. Whilst numerous approaches have been proposed in recent years, these tend to focus primarily on continuous or small-scale discrete action spaces. Factorised discrete action spaces, on the other hand, have received relatively little attention, despite many real-world problems naturally having factorisable actions. In this work, we undertake a formative investigation into offline reinforcement learning in factorisable action spaces. Using value-decomposition as formulated in DecQN as a foundation, we present the case for a factorised approach and conduct an extensive empirical evaluation of several offline techniques adapted to the factorised setting. In the absence of established benchmarks, we introduce a suite of our own comprising datasets of varying quality and task complexity. Advocating for reproducible research and innovation, we make all datasets available for public use alongside our code base.
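
The value-decomposition idea can be sketched in a few lines of Python. This sketch assumes the decomposition takes the global Q-value to be the mean of per-dimension utilities, which is our reading of DecQN and is not stated in the abstract. The utility values and function names are hypothetical.

```python
def decomposed_q(utilities, action):
    """Q-value of a factorised action as the mean of per-dimension utilities.

    utilities[i][a] is the utility of choosing sub-action a in dimension i
    for a fixed state (assumed mean decomposition).
    """
    return sum(u[a] for u, a in zip(utilities, action)) / len(utilities)


def greedy_action(utilities):
    """Under a mean decomposition, the argmax factorises across dimensions."""
    return tuple(max(range(len(u)), key=u.__getitem__) for u in utilities)


# Hypothetical utilities for one state: 3 action dimensions, 3 sub-actions each.
utilities = [
    [0.1, 0.7, 0.3],
    [0.5, 0.2, 0.4],
    [0.0, 0.9, 0.6],
]

best = greedy_action(utilities)
print(best, decomposed_q(utilities, best))
```

The greedy step scans 3 + 3 + 3 utilities rather than all 3**3 joint actions, which is the practical appeal of a factorised representation.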

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2411.11088

Country:

North America > United States > Montana (0.05)
Europe > Ireland (0.05)
Europe > United Kingdom > England > West Midlands > Coventry (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Education (0.46)
Transportation (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Goal-conditioned dual-action imitation learning for dexterous dual-arm robot manipulation

Kim, Heecheol, Ohmura, Yoshiyuki, Kuniyoshi, Yasuo

arXiv.org Artificial Intelligence | Mar-19-2024

Long-horizon dexterous robot manipulation of deformable objects, such as banana peeling, is a difficult task because of the difficulties in object modeling and a lack of knowledge about stable and dexterous manipulation skills. This paper presents a goal-conditioned dual-action (GC-DA) deep imitation learning (DIL) approach that can learn dexterous manipulation skills using human demonstration data. Previous DIL methods map the current sensory input to a reactive action, which often fails because of compounding errors in imitation learning caused by the recurrent computation of actions. The proposed method predicts a reactive action only when precise manipulation of the target object is required (local action) and generates the entire trajectory when precise manipulation is not required (global action). This dual-action formulation effectively prevents compounding errors in imitation learning by using the trajectory-based global action, while responding to unexpected changes in the target object during the reactive local action. The proposed method was tested on a real dual-arm robot and successfully accomplished the banana-peeling task.

global action, robot, subtask, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TRO.2024.3372778

2203.09749

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

Zhao, Tianyu, Taherisadr, Mojtaba, Elmalaki, Salma

arXiv.org Artificial Intelligence | Nov-6-2023

Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.

bit 0, fairness, fairo, (16 more...)

arXiv.org Artificial Intelligence

2307.05857

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Education (1.00)
Construction & Engineering > HVAC (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Global Policy Construction in Modular Reinforcement Learning

Zhang, Ruohan (The University of Texas at Austin) | Song, Zhao (The University of Texas at Austin) | Ballard, Dana H. (The University of Texas at Austin)

AAAI Conferences | Mar-6-2015

We propose a modular reinforcement learning algorithm which decomposes a Markov decision process into independent modules. Each module is trained using Sarsa(lambda). We introduce three algorithms for forming a global policy from the module policies, and demonstrate our results using a 2D grid world.
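
The abstract does not say which three combination rules are used. As one illustration of how module outputs can be merged, the Python sketch below picks the action with the largest summed Q-value across modules, a rule often called "greatest mass" in the modular RL literature. It is not claimed to be one of the paper's algorithms, and the Q-values and function name are hypothetical.

```python
def greatest_mass_action(module_q_values):
    """Pick the action whose Q-values, summed over all modules, are largest.

    module_q_values[m][a] is module m's Q-value for action a in the
    current state (all modules are assumed to share one action set).
    """
    num_actions = len(module_q_values[0])
    totals = [sum(q[a] for q in module_q_values) for a in range(num_actions)]
    return max(range(num_actions), key=totals.__getitem__)


# Hypothetical Q-values from three modules over three shared actions.
module_q = [
    [1.0, 0.2, 0.0],  # module 0 strongly prefers action 0
    [0.0, 0.9, 0.1],  # module 1 prefers action 1
    [0.1, 0.8, 0.3],  # module 2 prefers action 1
]

chosen = greatest_mass_action(module_q)
print(chosen)
```

Here the summed values are 1.1, 1.9, and 0.4, so the global policy selects action 1 even though module 0 alone would choose action 0.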

artificial intelligence, machine learning, reinforcement learning, (13 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas > Travis County > Austin (0.15)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Backdoors to Planning

Kronegger, Martin (Vienna University of Technology) | Ordyniak, Sebastian (Masaryk University Brno) | Pfandler, Andreas (Vienna University of Technology)

AAAI Conferences | Jul-14-2014

Backdoors measure the distance to tractable fragments and have become an important tool to find fixed-parameter tractable (fpt) algorithms. Despite their success, backdoors have not been used for planning, a central problem in AI that has a high computational complexity. In this work, we introduce two notions of backdoors building upon the causal graph. We analyze the complexity of finding a small backdoor (detection) and using the backdoor to solve the problem (evaluation) in the light of planning with (un)bounded plan length/domain of the variables. For each setting we present either an fpt-result or rule out the existence thereof by showing parameterized intractability. In three cases we achieve the most desirable outcome: detection and evaluation are fpt.

causal, backdoor, complexity, (16 more...)

AAAI Conferences

Twenty-Eighth AAAI Conference on Artificial Intelligence

Country: